Entropy and Information in Neural Spike Trains 
The nervous system represents time dependent signals in sequences of discrete action potentials
or spikes; all spikes are identical so that information is carried only in the spike arrival times. We 
show how to quantify this information, in bits, free from any assumptions about which features of 
the spike train or input signal are most important, and we apply this approach to the analysis of 
experiments on a motion sensitive neuron in the fly visual system. This neuron transmits information 
about the visual stimulus, at rates of up to 90 bits/s, within a factor of two of the physical limit set 
by the entropy of the spike train itself. 
As you read this text, optical signals reaching your retina are
encoded into sequences of identical pulses, termed action potentials
or spikes, that propagate along the $\sim 10^6$ fibers of the optic
nerve from eye to brain. This spike encoding appears almost universal,
occurring in animals as diverse as worms and humans, and spanning all
the sensory modalities [1]. The molecular mechanisms for the
generation and propagation of action potentials are well understood,
as are the mathematical reasons for the selection of stereotyped
pulses by the dynamics of the nerve cell membrane. Less well
understood is the function of these spikes as a code: How do the
sequences of spikes represent the sensory world, and how much
information is conveyed in this representation?

The temporal sequence of spikes provides a large capacity for
transmitting information, as emphasized by MacKay and McCulloch 45
years ago. One central question in studies of the nervous system is
whether the brain takes advantage of this large capacity, or whether
variations in spike timing represent noise which must be averaged
away. In response to a long sample of time varying stimuli, the spike
train of a single neuron varies, and we can quantify this variability
by the entropy per unit time of the spike train,
$\mathcal{S}(\Delta\tau)$, which depends on the time resolution
$\Delta\tau$ with which we record the spike arrival times. If we
repeat the same time dependent stimulus, we see a similar, but not
precisely identical, sequence of spikes (Fig. 1). This variability at
fixed input can also be quantified by an entropy, which we call the
conditional or noise entropy per unit time $\mathcal{N}(\Delta\tau)$.
The information that the spike train provides about the stimulus is
the difference between the total spike train entropy and this
conditional entropy, $R_{\rm info} = \mathcal{S} - \mathcal{N}$.
Because the noise entropy is positive (semi)definite, the entropy rate
sets the capacity for transmitting information, and we can define an
efficiency $e(\Delta\tau) = R_{\rm info}(\Delta\tau)/\mathcal{S}(\Delta\tau)$
with which this capacity is used. The question of whether spike timing
is important is really the question of whether this efficiency is high
at small $\Delta\tau$.

For some neurons, we understand enough about what the spike train
represents that direct "decoding" of the spike train is possible; the
information extracted by these decoding methods can be more than half
of the total spike train entropy with $\Delta\tau \sim 1$ ms. The idea
that sensory neurons provide a maximally efficient representation of
the outside world has also been suggested as an optimization principle
from which many features of these cells' responses can be derived.
But, particularly in the central nervous system, assumptions about
what is being encoded should be viewed with caution. The goal of this
paper is, therefore, to give a completely model independent estimate
of entropy and information in neural spike trains as they encode
dynamic signals.

We begin by discretizing the spike train into time bins of size
$\Delta\tau$, and examining segments of the spike train in windows of
length $T$, so that each possible neural response is a "word" with
$T/\Delta\tau$ symbols. Let us call the normalized count of the $i$th
word $p_i$, and then the "naive estimate" of the entropy is

$$S_{\rm naive}(T, \Delta\tau; {\rm size}) = -\sum_i p_i \log_2 p_i , \tag{1}$$

where the notation reminds us that our estimate of the entropy depends
on the size of the data set we use in accumulating the histogram. The
true entropy is

$$S(T, \Delta\tau) = \lim_{{\rm size}\to\infty} S_{\rm naive}(T, \Delta\tau; {\rm size}) , \tag{2}$$

and we are interested in the entropy rate

$$\mathcal{S}(\Delta\tau) = \lim_{T\to\infty} \frac{S(T, \Delta\tau)}{T} . \tag{3}$$

The difficulty is that, especially at large $T$, very large data sets
are required to ensure convergence of $S_{\rm naive}$ to the true
entropy $S$.
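
As an illustration of the naive estimate in Eq. (1), the following is a minimal sketch rather than the analysis code used for the experiments. It assumes the spike train has already been reduced to a list of 0/1 bins and that words are taken from nonoverlapping windows. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def words_from_bins(bins, word_len):
    """Cut a binned spike train into nonoverlapping words of word_len bins."""
    n_words = len(bins) // word_len
    return [tuple(bins[k * word_len:(k + 1) * word_len]) for k in range(n_words)]

def naive_entropy(words):
    """Plug-in entropy in bits of the empirical word distribution."""
    counts = Counter(words)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy binned spike train (1 = spike in that bin); hypothetical data.
bins = [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0]
words = words_from_bins(bins, word_len=2)   # here T / delta_tau = 2
S_naive = naive_entropy(words)
print(f"naive entropy: {S_naive:.4f} bits per word")
```

With so few words the estimate is strongly biased, which is exactly the dependence on data set size that the notation in Eq. (1) is meant to flag.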

Imagine that we have a spike train with mean spike rate
$\bar{r} \sim 40$ spikes/s and we sample with a time resolution
$\Delta\tau = 3$ ms. In a window of $T = 100$ ms, the maximum entropy
consistent with this mean rate is $S \sim 17.8$ bits, and we will see
that the entropy of real spike trains is not far from this bound. But
then there are roughly $2^S \sim 2 \times 10^5$ words with significant
$p_i$, and if our naive estimation procedure is going to work, we need
to have at least one sample of each word. If our samples come from
nonoverlapping 100 ms windows, then we need more than three hours of
data, and one might think that we need much more data than this to
ensure that the probability of each word is estimated with reasonable
accuracy. Such large quantities of data are generally inaccessible for
experiments on real neurons.
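
The arithmetic behind this estimate can be checked with a short sketch. It assumes, as a simplification not stated in the text, that each bin holds at most one spike and that bins are independent, so the maximum entropy is the binary entropy per bin times the number of bins. Under that assumption the bound comes out between 17 and 18 bits, close to the value quoted above, and the implied recording time is several hours.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a bin that holds a spike with probability p."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

rate = 40.0    # mean spike rate, spikes/s
dt = 0.003     # time resolution, s
T = 0.100      # window length, s

p_spike = rate * dt                          # spike probability per bin (assumes <= 1 spike per bin)
S_max = (T / dt) * binary_entropy(p_spike)   # maximum entropy per window, bits
n_words = 2 ** S_max                         # rough number of words with significant probability
hours = n_words * T / 3600.0                 # one nonoverlapping window per word

print(f"S_max ~ {S_max:.1f} bits, ~{n_words:.1e} words, ~{hours:.1f} hours")
```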

Here we report that it is possible to make progress despite these
pessimistic estimates. First, we examine explicitly the dependence of
our entropy estimates on the size of the data set and find regular
behaviors [10] that can be extrapolated to the infinite data limit.
Second, we evaluate robust upper [7] and lower bounds on the entropy
which serve as a check on our extrapolation procedure. Third, we are
interested in the extensive component of the entropy in large time
windows, and we find that a clean approach to extensivity is visible
before sampling problems set in. Finally, for the neuron studied, the
motion sensitive neuron H1 in the fly's visual system, we can actually
collect many hours of data.

H1 responds to motion across the entire visual field, producing more
spikes for an inward horizontal motion and fewer spikes for an outward
motion; vertical motions have no effect on this cell, but are coded by
other neurons. These cells provide visual feedback for flight control.
In the experiments analyzed here, the fly is immobilized and views
computer generated images on a display oscilloscope. For simplicity
these images consist of a fixed pattern of vertical stripes with
randomly chosen grey levels, and this pattern takes a random walk in
the horizontal direction.

FIG. 1. (A) Raw voltage records from a tungsten microelectrode near
the cell H1 are filtered and discretized. (B) Angular velocity of a
pattern moving across the fly's visual field produces a sequence of
spikes in H1, indicated by dots. Repeated presentations produce
slightly different spike sequences. For experimental methods see
Ref. [13].

FIG. 2. The frequency of occurrence for different "words" in the spike
train, with $\Delta\tau = 3$ ms and $T = 30$ ms. Words are placed in
order so that the histogram is monotonically decreasing; at this value
of $T$ the most likely word corresponds to no spikes. Inset shows the
dependence of the entropy, computed from this histogram according to
Eq. (1), on the fraction of data included in the analysis. Also
plotted is a least squares fit to the form
$S = S_0 + S_1/{\rm size} + S_2/{\rm size}^2$. The intercept $S_0$ is
our extrapolation to the true value of the entropy with infinite
data [10].



We begin our analysis with time bins of size $\Delta\tau = 3$ ms. For
a window of $T = 30$ ms, corresponding to the behavioral response time
of the fly [15], we can estimate the entropy rather accurately by the
naive procedure described above. Figure 2 shows the histogram
$\{p_i\}$ and the naive entropy estimates. We see that there are very
small finite data set corrections ($< 10^{-3}$), well fit by [10]

$$S_{\rm naive}(T, \Delta\tau; {\rm size}) = S(T, \Delta\tau) + \frac{S_1(T, \Delta\tau)}{{\rm size}} + \frac{S_2(T, \Delta\tau)}{{\rm size}^2} . \tag{4}$$

Under these conditions we feel confident that the extrapolated
$S(T, \Delta\tau)$ is the correct entropy. For sufficiently large $T$,
sampling problems occur: finite size corrections become much larger,
the contribution of the second correction is significant, and the
extrapolation to infinite size is unreliable.
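
The extrapolation in Eq. (4) amounts to a least squares fit of the naive estimates to a quadratic in 1/size, with the intercept taken as the entropy at infinite data. The sketch below is one way this could be done with the standard library only; the helper `fit_quadratic` and the synthetic numbers are assumptions for illustration, not values from the experiment.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit y = a0 + a1*x + a2*x**2; returns [a0, a1, a2]."""
    # Normal equations A a = b for the monomial basis 1, x, x^2.
    A = [[sum(x ** (i + j) for x in xs) for j in range(3)] for i in range(3)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    M = [row + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(3):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][3] / M[i][i] for i in range(3)]

# Synthetic naive estimates at several fractions of the data set (hypothetical).
fractions = [0.2, 0.25, 1 / 3, 0.5, 1.0]
inv_size = [1.0 / f for f in fractions]      # "size" measured in units of the full data set
S_true, S1, S2 = 5.0, -0.02, -0.001          # made-up coefficients for the demonstration
S_naive = [S_true + S1 * x + S2 * x ** 2 for x in inv_size]

S0, a1, a2 = fit_quadratic(inv_size, S_naive)
print(f"extrapolated entropy S0 = {S0:.4f} bits")
```

A practical check, in the spirit of the text, is to compare the sizes of the fitted first and second corrections: when the second is no longer small relative to the first, the intercept should not be trusted.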

Ma [11] discussed the problem of entropy estimation in the
undersampled limit. For probability distributions that are uniform on
a set of $N$ bins (as in the microcanonical ensemble), the entropy is
$\log_2 N$ and the problem is to estimate $N$. Ma noted that this
could be done by counting the number of times that two randomly chosen
observations yield the same configuration, since the probability of
such a coincidence is $1/N$. More generally, the probability of a
coincidence is $P_c = \sum_i p_i^2$, and hence

$$S = -\sum_i p_i \log_2 p_i = -\langle \log_2 p_i \rangle \ge -\log_2 \left( \langle p_i \rangle \right) = -\log_2 P_c , \tag{5}$$

so we can compute a lower bound to the entropy by counting
coincidences. Furthermore, as emphasized by Ma, $\log_2 P_c$ is less
sensitive to sampling errors than is $S_{\rm naive}$. The Ma bound is
the minimum entropy consistent with a given $P_c$, and it is one of
the Renyi entropies [16]. It is also at the heart of algorithms for
the analysis of attractors in dynamical systems [17].



The Ma bound is tightest for distributions that are close to uniform.
The distributions of neural responses cannot be uniform because the
spikes are sparse. But the distribution of words with fixed spike
count, $N_{\rm sp}$, is more nearly uniform, so we apply the Ma
bounding procedure in each $N_{\rm sp}$ sector. Thus
$S \ge S_{\rm MB}$, with

$$S_{\rm MB} = -\sum_{N_{\rm sp}} P(N_{\rm sp}) \times \log_2 \left[ P(N_{\rm sp}) \, \frac{2 n_c(N_{\rm sp})}{N_{\rm obs}(N_{\rm sp}) [N_{\rm obs}(N_{\rm sp}) - 1]} \right] , \tag{6}$$

where $n_c(N_{\rm sp})$ is the number of coincidences observed among
the words with $N_{\rm sp}$ spikes, $N_{\rm obs}(N_{\rm sp})$ is the
total number of occurrences of words with $N_{\rm sp}$ spikes, and
$P(N_{\rm sp})$ is the fraction of words with $N_{\rm sp}$ spikes.
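
The sector-by-sector coincidence count can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: words are tuples of 0/1 bins, a coincidence is any pair of identical words within a sector, and a sector with no observed coincidences contributes only its $-P \log_2 P$ term. That last choice is made here so the sum stays finite and conservative. The function name `ma_bound` is hypothetical.

```python
import math
from collections import Counter, defaultdict

def ma_bound(words):
    """Estimate the Ma lower bound (bits) from coincidences within spike-count sectors."""
    sectors = defaultdict(list)
    for w in words:
        sectors[sum(w)].append(w)            # group words by spike count N_sp
    n_total = len(words)
    bound = 0.0
    for ws in sectors.values():
        n_obs = len(ws)                      # occurrences of words in this sector
        p_sector = n_obs / n_total           # fraction of words with this N_sp
        # Number of coinciding pairs among the words in this sector.
        n_c = sum(c * (c - 1) // 2 for c in Counter(ws).values())
        bound -= p_sector * math.log2(p_sector)
        if n_c > 0:                          # assumption: skip the within-sector term otherwise
            p_c = 2 * n_c / (n_obs * (n_obs - 1))
            bound -= p_sector * math.log2(p_c)
    return bound

# Hypothetical sample: two equally likely words with different spike counts.
sample = [(0, 0)] * 4 + [(0, 1)] * 4
print(f"Ma bound: {ma_bound(sample):.3f} bits")
```

Because the coincidence probability is itself estimated from a finite sample, the computed bound fluctuates, and in small samples it can exceed the naive estimate for the same data.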

In Fig. 3 we plot the entropy as a function of the window size $T$,
with results both from the naive procedure and from the Ma bound. For
sufficiently large windows the naive procedure gives an answer smaller
than the Ma bound, and hence the naive answer must be unreliable
because it is more sensitive to sampling problems. Before this
sampling disaster the lower bound and the naive estimate are never
more than 10-15% apart. The point at which the naive estimate crashes
into the Ma bound is also where the second correction in Eq. (4)
becomes significant and we lose control over the extrapolation to the
infinite data limit. This occurs at $T \sim 100$ ms. We can trust the
Ma bound beyond this point, but it becomes steadily less powerful.

FIG. 3. The total and noise entropies per unit time (in bits per
second) are plotted versus the reciprocal of the window size (in
s$^{-1}$), with the time resolution held fixed at $\Delta\tau = 3$ ms.
Results are given both for the direct estimate and for the bounding
procedure described in the text, and for each data point we apply the
extrapolation procedures of Fig. 2 (inset). Dashed lines indicate
extrapolations to infinite word length, as discussed in the text, and
arrows indicate upper bounds obtained by differentiating $S(T)$ [7].

If the correlations in the spike train have finite range, then the
leading subextensive contribution to the entropy will be a constant
$C(\Delta\tau)$. This means that, at large $T$,

$$\frac{S(T, \Delta\tau)}{T} \approx \mathcal{S}(\Delta\tau) + \frac{C(\Delta\tau)}{T} . \tag{7}$$

This asymptotic behavior is seen clearly in Fig. 3, and emerges before
the sampling disaster. Given the clean linear behavior in a well
sampled region of the plot, we trust the extrapolation and arrive at
an estimate of the entropy per unit time,
$\mathcal{S}(\Delta\tau = 3\ {\rm ms}) = 157 \pm 3$ bits/s.
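
Since Eq. (7) is linear in $1/T$, the entropy rate can be read off as the intercept of a straight-line fit of $S/T$ against $1/T$ over the well sampled windows. The sketch below shows this on synthetic numbers; the window sizes, the rate, and the constant are made up for the demonstration and are not the measured values.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Synthetic data: window sizes in seconds and entropies per unit time in bits/s.
windows = [0.012, 0.018, 0.024, 0.030, 0.045, 0.060]
rate_inf, C = 150.0, 0.5                      # hypothetical entropy rate and constant
entropy_per_time = [rate_inf + C / T for T in windows]

intercept, slope = fit_line([1.0 / T for T in windows], entropy_per_time)
print(f"entropy rate ~ {intercept:.1f} bits/s, subextensive constant ~ {slope:.2f} bits")
```

In practice only windows below the onset of sampling problems should enter the fit, which is why the choice of the fitting range matters more than the fitting method.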

The entropy approaches its extensive limit from above, so that

$$\frac{1}{\Delta\tau} \left[ S(T + \Delta\tau, \Delta\tau) - S(T, \Delta\tau) \right] \ge \mathcal{S}(\Delta\tau) \tag{8}$$

for all window sizes $T$. This bound becomes progressively tighter at
larger $T$, until sampling problems set in. In fact there is a broad
plateau ($\pm 2.7\%$) in the range $18 < T < 60$ ms, leading to
$\mathcal{S} \le 157 \pm 4$ bits/s, in excellent agreement with the
extrapolation in Fig. 3.

The noise entropy per unit time $\mathcal{N}(\Delta\tau)$ measures the
variability of the spike train when the input signals are held fixed.
Hence we need to look at the ensemble of responses to repeated
presentations of the same time varying input signal. Marking a
particular time $t$ relative to the stimulus, we accumulate the
frequencies of occurrence of each word $i$ that begins at $t$, and
call this histogram $p_i(t)$. This generates a naive estimate of the
local noise entropy in the window from $t$ to $t + T$,

$$N^{\rm local}_{\rm naive}(t; T, \Delta\tau; {\rm size}) = -\sum_i p_i(t) \log_2 p_i(t) . \tag{9}$$

Estimating the average rate of information transmission by the spike
train requires knowing the noise entropy averaged over $t$,
$N_{\rm naive}(T, \Delta\tau; {\rm size}) = \langle N^{\rm local}_{\rm naive}(t; T, \Delta\tau; {\rm size}) \rangle_t$.
Then we analyze as before the extrapolation to large data set size and
large $T$. Fig. 3 also shows the noise entropy results as a function
of the window size. The difference between the two entropies is the
information which the cell transmits,
$R_{\rm info}(\Delta\tau = 3\ {\rm ms}) = 78 \pm 5$ bits/s, or
$1.8 \pm 0.1$ bits/spike.
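
The noise entropy average and the resulting information estimate can be sketched as below. Several simplifications are assumptions of this sketch rather than features of the experiment: trials are equal-length lists of 0/1 bins, the total entropy is computed by pooling words over all trials and start times (the text uses a long stimulus sample for this), and no extrapolation in data set size or window length is applied. All names and the toy trials are hypothetical.

```python
import math
from collections import Counter

def entropy_bits(items):
    """Plug-in entropy in bits of the empirical distribution of items."""
    counts = Counter(items)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def noise_entropy(trials, word_len):
    """Average over start times t of the word entropy across repeated trials."""
    n_bins = len(trials[0])
    local = [entropy_bits([tuple(tr[t:t + word_len]) for tr in trials])
             for t in range(n_bins - word_len + 1)]
    return sum(local) / len(local)

def total_entropy(trials, word_len):
    """Word entropy with all trials and start times pooled (assumption of this sketch)."""
    n_bins = len(trials[0])
    pooled = [tuple(tr[t:t + word_len])
              for tr in trials for t in range(n_bins - word_len + 1)]
    return entropy_bits(pooled)

# Toy example: four perfectly reproducible responses to the same stimulus.
dt, word_len = 0.003, 2                        # bin size in s, bins per word
trials = [[0, 1, 0, 0, 1, 0]] * 4
S_total = total_entropy(trials, word_len)
N_noise = noise_entropy(trials, word_len)
info_rate = (S_total - N_noise) / (word_len * dt)   # bits/s at this window size
print(f"S = {S_total:.3f} bits, N = {N_noise:.3f} bits, R ~ {info_rate:.0f} bits/s")
```

With identical trials the noise entropy vanishes and all of the word entropy counts as information; any trial-to-trial jitter raises the noise entropy and lowers the estimate.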

This analysis has been carried out over a range of time resolutions,
$800 > \Delta\tau > 2$ ms, and the results are summarized in Fig. 4.
Over this range, the entropy per unit time of the spike train varies
over a factor of roughly 40, illustrating the increasing capacity of
the system to convey information by making use of spike timing. Note
that $\Delta\tau = 800$ ms corresponds to counting spikes in bins that
contain typically thirty spikes, while $\Delta\tau = 2$ ms corresponds
to timing each spike to within 5% of the typical interspike interval.
The information that the spike train conveys about the dynamics of
motion across the visual field increases in approximate proportion to
the entropy, corresponding to $\sim 50\%$ efficiency, at least for
this ensemble of stimuli.



FIG. 4. Information and entropy rates computed at various time
resolutions, from $\Delta\tau = 800$ ms (lower left) to
$\Delta\tau = 2$ ms (upper right). Note the approximately constant,
high efficiency across the wide range of entropies.

Although we understand a good deal about the signals represented in
H1 [12,18], our present analysis does not hinge on this knowledge.
Similarly, although it is possible to collect very large data sets
from H1, Figs. 2 and 3 suggest that more limited data sets would
compromise our conclusions only slightly. It is feasible, then, to
apply these same analysis techniques to cells in the mammalian
brain [19]. Like cells in the monkey or cat primary visual cortex, H1
is four layers 'back' from the array of photodetectors and receives
its inputs from thousands of synapses. For this central neuron, half
the available information capacity is used, down to millisecond
precision. Thus, the analysis developed here allows us for the first
time to demonstrate directly the significance of spike timing in the
nervous system without any hypotheses about the structure of the
neural code.